Advanced Techniques in Web Data Pre-processing and Cleaning

Authors

  • Pablo E. Román
  • Juan D. Velásquez
Abstract

Central to successful e-business is the construction of web sites that attract users, capture their preferences, and entice them into making a purchase. Web mining applies a range of data mining techniques to categorize the content and structure of web sites, with the goal of aiding e-business. It requires knowledge of the web site structure (the hyperlink graph), the web content (the vector model), and user sessions (the sequence of pages each user visits on a site). Much of the data for web mining can be noisy. The noise has many sources, for example undocumented changes to the web site structure and content, differing interpretations of text and media semantics, and web logs without individual user identification. The number of times a specific page was visited in a session may also go unrecorded, because the page may be served from a proxy or web browser cache. Such noise presents a challenge for web mining. This chapter presents issues with, and approaches for, cleaning web data in preparation for web mining analysis.
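
To make the notion of a user session concrete, the following is a minimal sketch of one common sessionization heuristic. It is not the chapter's own method. It assumes that a visitor can be approximated by the pair of IP address and user agent (since the logs carry no individual user identification) and that an inactivity gap of more than 30 minutes starts a new session. The function name `sessionize`, the record layout, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity threshold; a widely used heuristic, not taken from the chapter.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (ip, user_agent, timestamp, url) records into sessions.

    A visitor is approximated by the (ip, user_agent) pair. A new session
    starts whenever the gap between two consecutive requests of the same
    visitor exceeds `timeout`.
    """
    by_visitor = defaultdict(list)
    for ip, agent, ts, url in records:
        by_visitor[(ip, agent)].append((ts, url))

    sessions = []
    for visitor, hits in by_visitor.items():
        hits.sort(key=lambda hit: hit[0])  # order each visitor's requests by time
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((visitor, current))
    return sessions


# Hypothetical log records, already parsed from a web server log.
records = [
    ("10.0.0.1", "Firefox", datetime(2010, 1, 1, 9, 0), "/home"),
    ("10.0.0.1", "Firefox", datetime(2010, 1, 1, 9, 5), "/products"),
    ("10.0.0.1", "Firefox", datetime(2010, 1, 1, 10, 0), "/home"),
    ("10.0.0.2", "Chrome", datetime(2010, 1, 1, 9, 1), "/home"),
]
sessions = sessionize(records)
```

With the sample records, the first visitor's requests split into two sessions because of the 55-minute gap. The sketch does not recover page views served from a proxy or browser cache, which never reach the server log; handling those requires additional knowledge such as the hyperlink graph.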


Similar resources

Chapter 2 Web Usage Data Pre-processing

End users leave traces of behavior all over the Web all times. From the explicit or implicit feedback of a multimedia document or a comment in an online social network, to a simple click in a relevant link in a search engine result, the information that we as users pour into the Web defines its actual representation, which is independent for each user. Our usage can be represented by different ...


Pre-processing of Web Logs for Mining World Wide Web Browsing Patterns

Web usage mining is a type of web mining, which exploits data mining techniques to extract required information from navigational behaviour of WWW users. Hence the data must be preprocessed to improve the efficiency and ease of the mining process. So it is important to pre-process before applying data mining techniques to discover user access patterns from web logs. The main task of data pre-pr...


Preprocessing on Web Server Log Data for Web Usage Pattern Discovery

World Wide Web has gained popularity because of the fact that it acts as an effective communication medium between business and end users. Company needs to have a web site which satisfies the intended needs of their end users. Users like to revisit a web site which is usable in nature. Web usage patterns of end users must be identified to improve usability on any web site. It is done with analy...


An Efficient Algorithm for Data Cleaning of Web Logs with Spider Navigation Removal

The World Wide Web is growing massively larger with the exponential growth of websites providing the user with heaps of information. Text files called as web logs are used to store the clicks of a user whenever a user visits a website. Web usage mining is a stream of web mining that involves the applications of mining techniques to be applied on the server logs containing the user clickstreams....


A Survey of Preprocessing Method for Web Usage Mining Process

The amount of web applications are increasing in large amount and users of web applications are also increasing rapidly with high speed. By increasing number of users the size of log file also increases. The information which stores in log files cannot be directly used for analysis. Therefore preprocessing of log files is necessary to improve the quality of web usage mining process. Preprocessi...



Publication date: 2010